A large amount of transaction data containing associations between individuals and sensitive information flows every day into data stores. Examples include web queries, credit card transactions, medical exam records, and transit database records. The serial release of these data to partner institutions or data analysis centers is a common situation. In this paper we show that, in most domains, correlations among sensitive values associated with the same individuals in different releases can be easily mined and then used to violate users' privacy by adversaries who observe multiple data releases. We provide a formal model for privacy attacks based on this sequential background knowledge, as well as on background knowledge about the probability distribution of sensitive values over different individuals. We show how an adversary can actually obtain sequential background knowledge and use it to identify, with high confidence, the sensitive values associated with an individual. We propose a defense algorithm based on Jensen-Shannon divergence, and extensive experiments show the superiority of the proposed technique over other applicable solutions. To the best of our knowledge, this is the first work that systematically investigates the role of sequential background knowledge in the serial release of transaction data.
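The defense mentioned above builds on the Jensen-Shannon divergence, a symmetric and bounded measure of how far apart two probability distributions are. The sketch below shows only that standard measure for discrete distributions over the same set of sensitive values. It is not the paper's defense algorithm. The function names and the example distributions are hypothetical and serve purely as an illustration.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D_KL(p || q) in bits.

    Terms with p_i == 0 contribute 0 by convention.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions.

    JSD(p || q) = 1/2 * KL(p || m) + 1/2 * KL(q || m), with m = (p + q) / 2.
    Using log base 2 keeps the result in [0, 1].
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same support size")
    # The mixture m is positive wherever p or q is, so the KL terms never divide by zero.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical example: an adversary's prior over three sensitive values,
# versus a sharper belief after correlating several releases.
prior = [0.5, 0.3, 0.2]
posterior = [0.9, 0.05, 0.05]
print(js_divergence(prior, posterior))
```

Under this illustrative reading, a larger value means the adversary's belief about an individual's sensitive value has moved further from the background distribution. Identical distributions give 0, and distributions with disjoint support give 1.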